Weight-sharing quantization has emerged as a technique to reduce energy expenditure during inference in large neural networks by constraining their weights to a limited set of values.
Concretely,wecontributethefollowing: 1. Wepropose amodel (section 3, Algorithm 1) that can predict aset from afeature vector (vector-to-set) while properly taking the structure of sets into account.